174 research outputs found

    Discussion on shipping dangerous goods accidents rescue of China

    Discovering weak community structures in large biological networks

    Identifying intrinsic structures in large networks is a fundamental problem in many fields, such as biology, engineering and social sciences. Motivated by biological applications, in this paper we are concerned with identifying community structures, which are densely connected sub-graphs, in large biological networks. We address several critical issues in finding community structures. First, biological networks directly constructed from experimental data often contain spurious edges and may also miss genuine connections. As a result, community structures in biological networks are often weak. We introduce simple operations to capture local neighborhood structures for identifying weak communities. Second, we consider the issue of automatically determining the most appropriate number of communities, a crucial problem for all clustering methods. This requires properly evaluating the quality of community structures. We extend an existing modularity function for evaluating community structures to weighted graphs. Third, we propose a spectral clustering algorithm to optimize the modularity function, and a greedy partitioning method that approximates the spectral algorithm with much reduced running time. We evaluate our methods on many networks of known structure, and apply them to three real-world networks that have different types of network communities: a yeast protein-protein interaction network, a co-expression network of yeast cell-cycle genes, and a collaboration network of bioinformaticians. The results show that our methods can find superb community structures and the correct numbers of communities. Our results reveal several interesting network structures that have not been reported previously.
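
    As an illustration of evaluating community quality on a weighted graph, the following is a minimal sketch of the standard Newman-style modularity with edge weights in place of edge counts. It is an assumption that this matches the weighted extension the abstract refers to; the function name and the input layout are hypothetical.

```python
from collections import defaultdict


def weighted_modularity(edges, community):
    """Return Newman-style modularity Q for an undirected weighted graph.

    edges: iterable of (u, v, weight), each undirected edge listed once.
    community: dict mapping every node to a community label.
    Both argument layouts are assumptions made for this sketch.
    """
    total_weight = 0.0             # sum of all edge weights
    internal = defaultdict(float)  # weight of edges inside each community
    strength = defaultdict(float)  # summed weighted degree of each community

    for u, v, w in edges:
        total_weight += w
        strength[community[u]] += w
        strength[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w

    if total_weight == 0:
        return 0.0

    # Q = sum over communities of (observed internal fraction
    #     minus the fraction expected from the community's total strength).
    return sum(
        internal[c] / total_weight - (strength[c] / (2.0 * total_weight)) ** 2
        for c in strength
    )
```

    Under this definition, candidate partitions with different numbers of communities can be compared by their Q values, which is one way to approach the question of how many communities to report.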

    Discovering Functional Modules by Clustering Gene Co-expression Networks

    Identification of groups of functionally related genes from high-throughput gene expression data is an important step towards elucidating gene functions at a global scale. Most existing approaches treat gene expression data as points in a metric space, and apply conventional clustering algorithms to identify sets of genes that are close to each other in that space. However, they usually ignore the topology of the underlying biological networks. In this paper, we propose a network-based clustering method that is biologically more realistic. Given a gene expression data set, we apply a rank-based transformation to obtain a sparse co-expression network, and use a novel spectral clustering algorithm to identify natural community structures in the network, which correspond to gene functional modules. We have tested the method on two large-scale gene expression data sets, from yeast and Arabidopsis. The results show that the clusters identified by our method on these data sets are functionally richer and more coherent than the clusters from the standard k-means clustering algorithm.
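
    One way to read the rank-based transformation is sketched below: each gene is linked to the few genes whose profiles rank highest in similarity to its own, which keeps the network sparse regardless of the absolute similarity values. Pearson correlation as the similarity measure, the `top_k` parameter, and the function names are assumptions for illustration, not details given in the abstract.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_based_network(profiles, top_k):
    """Link every gene to its top_k most correlated genes.

    profiles: dict mapping gene name -> list of expression values.
    Returns a set of undirected edges, each a frozenset of two gene names.
    """
    edges = set()
    for gene, values in profiles.items():
        scored = [
            (pearson(values, other_values), other)
            for other, other_values in profiles.items()
            if other != gene
        ]
        # Highest correlation first; ties broken by gene name for determinism.
        scored.sort(key=lambda item: (-item[0], item[1]))
        for _, other in scored[:top_k]:
            edges.add(frozenset((gene, other)))
    return edges
```

    Because an edge depends only on rank, a gene with uniformly low correlations still receives `top_k` links, while a gene with many high correlations does not flood the network.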

    A Top-Performing Algorithm for the DREAM3 Gene Expression Prediction Challenge

    A wealth of computational methods has been developed to address problems in systems biology, such as modeling gene expression. However, objectively evaluating and comparing such methods is notoriously difficult. The DREAM (Dialogue on Reverse Engineering Assessments and Methods) project is a community-wide effort to assess the relative strengths and weaknesses of different computational methods for a set of core problems in systems biology. This article presents a top-performing algorithm for one of the challenge problems in the third annual DREAM (DREAM3), namely the gene expression prediction challenge. In this challenge, participants are asked to predict the expression levels of a small set of genes in a yeast deletion strain, given the expression levels of all other genes in the same strain and complete gene expression data for several other yeast strains. I propose a simple k-nearest-neighbor (KNN) method to solve this problem. Despite its simplicity, this method works well for this challenge, sharing the “top performer” honor with a much more sophisticated method. I also describe several alternative, simple strategies, including a modified KNN algorithm that further improves the performance of the standard KNN method. The success of these methods suggests that complex methods attempting to integrate multiple data sets do not necessarily lead to better performance than simple yet robust methods. Furthermore, none of these top-performing methods, including the one by a different team, is based on gene regulatory networks, which suggests that accurately modeling gene expression using gene regulatory networks is unfortunately still a difficult task.
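
    The abstract does not say how neighbors are defined, so the following is only a sketch under a stated assumption: neighbors are genes whose expression profiles across the other strains are closest (in Euclidean distance) to the profile of the gene to be predicted, and the prediction is the mean of those neighbors' measured values in the deletion strain. The function name and data layout are hypothetical.

```python
import math


def knn_predict(reference, target_known, gene, k):
    """Predict one gene's expression in the target strain from its k nearest genes.

    reference: dict gene -> expression values across the other (reference) strains.
    target_known: dict gene -> measured expression in the target strain.
    gene: name of the gene to predict; it must appear in reference.
    """
    query = reference[gene]
    distances = []
    for other, _ in target_known.items():
        if other == gene or other not in reference:
            continue
        # Similarity is judged only on the reference strains.
        distances.append((math.dist(query, reference[other]), other))

    distances.sort()
    neighbors = distances[:k]
    if not neighbors:
        raise ValueError("no candidate neighbor genes available")

    # Average the neighbors' measured values in the target strain.
    return sum(target_known[name] for _, name in neighbors) / len(neighbors)
```

    A weighted average (for example, weighting each neighbor by inverse distance) would be one possible variation; whether that corresponds to the modified KNN mentioned in the abstract is not stated there.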

    CAGER: classification analysis of gene expression regulation using multiple information sources

    BACKGROUND: Many classification approaches have been applied to analyzing the transcriptional regulation of gene expression. These methods build models that can explain a gene's expression level from the regulatory elements (features) on its promoter sequence. Different types of features, such as experimentally verified binding motifs, motifs discovered by computer programs, or transcription factor binding data measured with Chromatin Immunoprecipitation (ChIP) assays, have been used towards this goal. Each type of feature has been shown to be successful in modeling gene transcriptional regulation under certain conditions. However, no comparison has been made to evaluate the relative merit of these features. Furthermore, most publicly available classification tools were not designed specifically for modeling transcriptional regulation, and do not allow the user to combine different types of features. RESULTS: In this study, we use a specific classification method, decision trees, to model transcriptional regulation in yeast with features based on predefined motifs, automatically identified motifs, ChIP-chip data, or their combinations. We compare the accuracy and stability of these models, and analyze their capabilities in identifying functionally related genes. Furthermore, we design and implement a user-friendly web server called CAGER (Classification Analysis of Gene Expression Regulation) that integrates several software components for automated analysis of transcriptional regulation using decision trees. Finally, we use CAGER to study the transcriptional regulation of Arabidopsis genes in response to abscisic acid, and report some interesting new results. CONCLUSION: Models built with ChIP-chip data suffer from low accuracy when the condition under which gene expression is measured differs significantly from the condition under which the ChIP experiment is conducted. Models built with automatically identified motifs can sometimes discover new features, but their modeling accuracy may have been over-estimated in previous studies. Furthermore, models built with automatically identified motifs are not stable with respect to noise. A combination of ChIP-chip data and predefined motifs can substantially improve modeling accuracy, and is effective in identifying true regulons. The CAGER web server, which is freely available at , allows the user to select combinations of different feature types for building decision trees, and to interact with the models graphically. We believe that it will be a useful tool to facilitate the discovery of gene transcriptional regulatory networks.

    An Iterative Loop Matching Approach to the Prediction of RNA Secondary Structures with Pseudoknots

    Motivation: Pseudoknots have generally been excluded from the prediction of RNA secondary structures due to the difficulty in modeling them and the complexity of computing them. Although several dynamic programming algorithms exist for the prediction of pseudoknots using thermodynamic approaches, they are neither reliable nor efficient. On the other hand, comparative methods are more reliable, but are often done in an ad hoc manner and require expert intervention. Maximum weighted matching (Tabaska et al., Bioinformatics, 14:691-9, 1998), an algorithm for pseudoknot prediction with comparative analysis, suffers from low prediction accuracy in many cases. Here we present an algorithm, iterative loop matching, for predicting RNA secondary structures including pseudoknots reliably and efficiently. The method can utilize either thermodynamic or comparative information or both, and is thus able to make predictions for both aligned sequences and individual sequences. Results: We have tested the algorithm on a number of RNA families, including structures both with and without pseudoknots. Using 8–12 homologous sequences, the algorithm correctly identifies more than 90% of base-pairs for short sequences and 80% overall. It correctly predicts nearly all pseudoknots. Furthermore, it produces very few spurious base-pairs for sequences without pseudoknots. Comparisons show that our algorithm is both more sensitive and more specific than the maximum weighted matching method. In addition, our algorithm has high prediction accuracy on individual sequences, comparable to the PKNOTS algorithm (Rivas & Eddy, J Mol Biol, 285:2053-68, 1999), while using far fewer computational resources. Availability: The program has been implemented in ANSI C and is freely available for academic use at http://www.cse.wustl.edu/˜zhang/projects/rna/ilm/
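
    For orientation, the sketch below shows a Nussinov-style dynamic program that maximizes the number of nested base pairs in a single sequence. It assumes that "loop matching" refers to this family of dynamic programs; it does not implement the iterative step, pseudoknots, or the thermodynamic and comparative scores described in the abstract. The pairing rules and the `min_loop` parameter are assumptions for illustration.

```python
def max_base_pairs(seq, min_loop=3):
    """Return the maximum number of nested base pairs in an RNA sequence.

    min_loop is the minimum number of unpaired bases enclosed by any pair.
    Watson-Crick and G-U wobble pairs each score 1 (an illustrative choice).
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0

    # best[i][j] holds the optimum for the subsequence seq[i..j], inclusive.
    best = [[0] * n for _ in range(n)]

    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: position j stays unpaired
            # case 2: position j pairs with some k, leaving >= min_loop bases between
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1]
                    score = max(score, left + 1 + inner)
            best[i][j] = score

    return best[0][n - 1]
```

    Structures produced by this recurrence are strictly nested, which is why a plain dynamic program of this kind cannot represent pseudoknots on its own.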

    Discovering Transcriptional Regulatory Rules from Gene Expression and TF-DNA Binding Data by Decision Tree Learning

    Background: One of the most promising but challenging tasks in the post-genomic era is to reconstruct transcriptional regulatory networks. The goal is to reveal, for each gene that responds to a certain biological event, which transcription factors affect its transcription, and how several transcription factors coordinate to accomplish specific regulation. Results: Here we propose a supervised machine learning approach to address these questions. We build decision trees to associate the expression level of a gene with the transcription factor binding data of its promoter. From the decision trees, we extract regulatory rules that specify how the binding of a combination of several transcription factors affects the expression of a gene. Such rules are easy to interpret, and represent experimentally testable hypotheses. We use a decision tree ensemble approach to increase modeling accuracy and robustness. We also propose a novel method to integrate rules learned from several time series that measure the same biological processes. We apply our method to publicly available cell cycle expression data and transcription factor binding data for the budding yeast. Cross-validation experiments show that our method is highly accurate and reliable. The method correctly identifies all major known yeast cell cycle transcription factors, and assigns them to the appropriate cell cycle phases. It also explicitly reveals synergistic relationships among transcription factors, most of which agree well with the existing literature, while the rest provide testable biological hypotheses. Conclusions: The high accuracy of our method indicates that it is valid and that the learned regulatory rules can be used as the basic building blocks of a transcriptional regulatory network. As more gene expression and TF binding data become available, we believe that our method will be useful for reconstructing large-scale transcriptional regulatory networks.
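
    To make the link between a tree node and a regulatory rule concrete, here is a minimal sketch of a single split: it picks the transcription factor whose bound/unbound partition of the genes gives the lowest weighted Gini impurity of the expression labels. Binary binding calls, discrete expression labels, the Gini criterion, and the names `TF_A` and `TF_B` in the usage note are all assumptions; the abstract does not specify the tree learner or the ensemble procedure.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(binding, expression):
    """Choose the TF whose binding status best separates expression labels.

    binding: dict gene -> dict TF name -> bool (True if the TF binds the promoter).
    expression: dict gene -> discrete expression label, e.g. "up" or "down".
    Returns (tf_name, weighted_impurity), or None if no TF is listed.
    """
    genes = list(expression)
    tfs = sorted({tf for g in genes for tf in binding[g]})
    best = None
    for tf in tfs:
        bound = [expression[g] for g in genes if binding[g].get(tf, False)]
        unbound = [expression[g] for g in genes if not binding[g].get(tf, False)]
        impurity = (len(bound) * gini(bound) + len(unbound) * gini(unbound)) / len(genes)
        if best is None or impurity < best[1]:
            best = (tf, impurity)
    return best
```

    If `best_split` returns the hypothetical factor `TF_A` with impurity 0.0 and all bound genes are labeled "up", the split reads as the rule "if TF_A binds the promoter, the gene is up-regulated"; repeating the split inside each branch would yield rules over combinations of factors.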

    A Novel Multiple Classifier Generation and Combination Framework Based on Fuzzy Clustering and Individualized Ensemble Construction

    Multiple classifier systems (MCSs) have become a successful alternative for improving classification performance. However, studies have shown inconsistent results for different MCSs, and it is often difficult to predict which MCS algorithm works best on a particular problem. We believe that the two crucial steps of an MCS, base classifier generation and multiple classifier combination, need to be designed in coordination to produce robust results. In this work, we show that for different testing instances, better classifiers may be trained from different subdomains of training instances including, for example, neighboring instances of the testing instance, or even instances far away from the testing instance. To exploit this observation, we propose the Individualized Classifier Ensemble (ICE). ICE groups training data into overlapping clusters, builds a classifier for each cluster, and then associates each training instance with the top-performing models while taking into account model types and frequency. In testing, ICE finds the k most similar training instances for a testing instance, then predicts the class label of the testing instance by averaging the predictions from the models associated with these training instances. Evaluation results on 49 benchmarks show that ICE achieves a stable improvement over existing MCS methods on a significant proportion of datasets. ICE provides a novel way of utilizing internal patterns among instances to improve classification, and can be easily combined with various classification models and applied to many application domains.
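
    The testing step described above can be sketched as follows, assuming the cluster models have already been trained and each training instance already carries its list of associated models. Representing a model as a callable that returns a positive-class score, using Euclidean distance for similarity, and the function name are assumptions made for this illustration.

```python
import math


def ice_predict(train_points, associated_models, query, k):
    """Average the scores of models tied to the k training points nearest to query.

    train_points: list of numeric feature tuples.
    associated_models: list (parallel to train_points) of lists of callables;
        each callable maps a feature tuple to a positive-class score.
    """
    # Indices of the k training instances most similar to the query.
    nearest = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )[:k]

    # Collect one score per (neighbor, associated model) pair.
    scores = [model(query) for i in nearest for model in associated_models[i]]
    if not scores:
        raise ValueError("no models associated with the nearest training instances")
    return sum(scores) / len(scores)
```

    A model associated with several of the neighbors is counted once per neighbor here, so it carries more weight in the average; whether ICE weights repeated models this way is not stated in the abstract.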

    Variations in the transcriptome of Alzheimer's disease reveal molecular networks involved in cardiovascular diseases

    Analysis of microarray data reveals extensive links between Alzheimer’s disease and cardiovascular diseases